Calibrated Imputation of Numerical Data under Linear Edit Restrictions
Abstract
A common problem faced by statistical institutes is that data may be missing from collected datasets. The typical way to overcome this problem is to impute the missing data. The problem of imputing missing data is complicated by the fact that statistical data often have to satisfy certain edit rules and that values of variables across units sometimes have to sum up to known totals. For numerical data, edit rules are most often formulated as linear restrictions on the variables. For example, for data on enterprises edit rules could be that the profit and costs of an enterprise should sum up to its turnover and that the turnover should be at least zero. The totals of some variables across units may already be known from administrative data (e.g. turnover from a tax register) or estimated from other sources. Standard imputation methods for numerical data as described in the literature generally do not take such edit rules and totals into account. In this article we describe algorithms for imputing missing numerical data that take edit restrictions into account and ensure that sums are calibrated to known totals. These algorithms are based on a sequential regression approach that uses regression predictions to impute the variables one by one. To assess the performance of the imputation methods a simulation study is carried out as well as an evaluation study based on a real dataset.
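The two example edit rules in the abstract (profit plus costs equals turnover, and turnover is at least zero) and the requirement that values sum to a known total can be made concrete with a small sketch. This is not the sequential regression algorithm from the article. It is a minimal illustration that assumes preliminary imputations are already available, and it uses a simple ratio adjustment as a stand-in for calibration. The function names, field names, and toy figures are all hypothetical.

```python
# Illustrative sketch only; NOT the algorithm described in the article.
# All names and numbers below are hypothetical.

def satisfies_edits(record, tol=1e-9):
    """Check the two example edit rules from the abstract for one unit."""
    # Balance edit: profit + costs = turnover (a linear equality restriction).
    balance_ok = abs(record["profit"] + record["costs"] - record["turnover"]) <= tol
    # Non-negativity edit: turnover >= 0 (a linear inequality restriction).
    nonneg_ok = record["turnover"] >= -tol
    return balance_ok and nonneg_ok


def calibrate_imputed_turnover(records, missing_ids, known_total):
    """Rescale imputed turnover so the column sums to known_total.

    Observed values are left untouched. Only units listed in missing_ids are
    adjusted, all by one common ratio (a simplifying assumption).
    """
    observed_sum = sum(
        r["turnover"] for i, r in enumerate(records) if i not in missing_ids
    )
    imputed_sum = sum(records[i]["turnover"] for i in missing_ids)
    remainder = known_total - observed_sum
    if imputed_sum <= 0 or remainder < 0:
        raise ValueError("cannot calibrate by ratio adjustment")
    factor = remainder / imputed_sum
    for i in missing_ids:
        r = records[i]
        r["turnover"] *= factor
        # Restore the balance edit by recomputing profit from turnover and costs.
        r["profit"] = r["turnover"] - r["costs"]
    return records


# Toy data: unit 0 is fully observed; units 1 and 2 carry preliminary
# imputations for turnover (and hence profit). Costs are observed everywhere.
records = [
    {"profit": 20.0, "costs": 80.0, "turnover": 100.0},
    {"profit": 10.0, "costs": 40.0, "turnover": 50.0},
    {"profit": 30.0, "costs": 120.0, "turnover": 150.0},
]
missing_ids = {1, 2}
known_total = 340.0  # e.g. a turnover total taken from a tax register

calibrate_imputed_turnover(records, missing_ids, known_total)
total_turnover = sum(r["turnover"] for r in records)
all_edits_ok = all(satisfies_edits(r) for r in records)
```

A ratio adjustment keeps non-negative values non-negative, but it ignores the relationships between variables that a regression-based approach would use, so it should be read only as a way to see the constraints in action.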
Related resources
General Methods and Algorithms for Modeling and Imputing Discrete Data under a Variety of Constraints
Loglinear modeling methods have become quite straightforward to apply to discrete data X. The models for missing data involve minor extensions of hot-deck methods (Little and Rubin 2002). Edits are structural zeros that forbid certain patterns. Winkler (2003) provided the theory for connecting edit with imputation. In this paper, we give methods and algorithms for modeling/edit/imputation under...
Diagnostic Measures in Ridge Regression Model with AR(1) Errors under the Stochastic Linear Restrictions
Outliers and influential observations have important effects on regression analysis. The goal of this paper is to extend the mean-shift model for detecting outliers to the ridge regression model in the presence of stochastic linear restrictions when the error terms follow an autoregressive AR(1) process. Furthermore, extensions of measures for diagnosing influential observations are ...
CANCEIS Experiments of Edit and Imputation with 2006 Census Test Data
In this report, we demonstrate the CANCEIS (CANadian Census Edit and Imputation System) experiments of edit and imputation with the 2006 test data. The major effort is to translate the if-then-else rules of current edit and imputation system of the decennial census into the decision logic tables (DLT) of CANCEIS. We also formulate the input files that are needed to run the CANCEIS software. The...
Results of Evaluation of AGGIES for ACES
The U. S. Census Bureau’s Annual Capital Expenditures Survey (ACES) collects data about domestic capital expenditures in non-farm businesses operating within the United States. Analysts manually edit the ACES data using a specified set of editing rules. Although individual edits are straightforward, the hierarchical combination of edits is complicated with several nested levels of simultaneous...
Kernel Ridge Estimator for the Partially Linear Model under Right-Censored Data
Objective: This paper aims to introduce a modified kernel-type ridge estimator for partially linear models under randomly-right censored data. Such models include two main issues that need to be solved: multi-collinearity and censorship. To address these issues, we improved the kernel estimator based on synthetic data transformation and kNN imputation techniques. The key idea of this paper is t...
Publication date: 2013